When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors

نویسندگان

  • Charles Smutz
  • Angelos Stavrou
چکیده

Machine learning classifiers are a vital component of modern malware and intrusion detection systems. However, past studies have shown that classifier based detection systems are susceptible to evasion attacks in practice. Improving the evasion resistance of learning based systems is an open problem. To address this, we introduce a novel method for identifying the observations on which an ensemble classifier performs poorly. During detection, when a sufficient number of votes from individual classifiers disagree, the ensemble classifier prediction is shown to be unreliable. The proposed method, ensemble classifier mutual agreement analysis, allows the detection of many forms of classifier evasion without additional external ground truth. We evaluate our approach using PDFrate, a PDF malware detector. Applying our method to data taken from a real network, we show that the vast majority of predictions can be made with high ensemble classifier agreement. However, most classifier evasion attempts, including nine targeted mimicry scenarios from two recent studies, are given an outcome of uncertain indicating that these observations cannot be given a reliable prediction by the classifier. To show the general applicability of our approach, we tested it against the Drebin Android malware detector where an uncertain prediction was correctly given to the majority of novel attacks. Our evaluation includes over 100,000 PDF documents and 100,000 Android applications. Furthermore, we show that our approach can be generalized to weaken the effectiveness of the Gradient Descent and Kernel Density Estimation attacks against Support Vector Machines. We discovered that feature bagging is the most important property for enabling ensemble classifier diversity based evasion detection.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Hardening Classifiers against Evasion: the Good, the Bad, and the Ugly

Machine learning is widely used in security applications, particularly in the form of statistical classification aimed at distinguishing benign from malicious entities. Recent research has shown that such classifiers are often vulnerable to evasion attacks, whereby adversaries change behavior to be categorized as benign while preserving malicious functionality. Research into evasion attacks has...

متن کامل

Fault Detection of Bearings Using a Rule-based Classifier Ensemble and Genetic Algorithm

This paper proposes a reduct construction method based on discernibility matrix simplification. The method works with genetic algorithm. To identify potential problems and prevent complete failure of bearings, a new method based on rule-based classifier ensemble is presented. Genetic algorithm is used for feature reduction. The generated rules of the reducts are used to build the candidate base...

متن کامل

Classifier Ensemble Framework: a Diversity Based Approach

Pattern recognition systems are widely used in a host of different fields. Due to some reasons such as lack of knowledge about a method based on which the best classifier is detected for any arbitrary problem, and thanks to significant improvement in accuracy, researchers turn to ensemble methods in almost every task of pattern recognition. Classification as a major task in pattern recognition,...

متن کامل

Discerning Machine Learning Degradation via Ensemble Classifier Mutual Agreement Analysis

Machine learning classifiers are a crucial component of modern malware and intrusion detection systems. However, past studies have shown that classifier-based detection systems are susceptible to evasion attacks in practice. Improving the evasion resistance of learning based systems is an open problem. In this paper, we analyze the effects of mimicry attacks on real-world classifiers. To counte...

متن کامل

Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogenous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of Sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016